-
-
Notifications
You must be signed in to change notification settings - Fork 498
refactor: Introduce a TransmitBuf for poll_transmit #2168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Initial feedback from skimming:
|
f1b8fed to
fc2d121
Compare
|
I've attempted to split this up, hopefully the commits are more or less what you had in mind. Still makes the same change essentially. |
djc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, did a more detailed pass although I have not completely reviewed all the comments and code changes.
Despite your preferences, please stick to single spaces after periods and US English in this project. 👍
| /// This is also the maximum size datagrams are allowed to be. The first and last | ||
| /// datagram in a batch are allowed to be smaller however. After the first datagram the | ||
| /// segment size is clipped to the size of the first datagram. | ||
| pub(super) fn segment_size(&self) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess? I don't love the LoC overhead this relative to the benefit this provide over an implicit/documentation contract that callers shouldn't be mutating TransmitBuf's fields directly. It is a private API, of course, which we can trivially change over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's essentially a comment on this entire commit. I do feel fairly strongly that the encapsulation matters, in Rust this means you have to write accessors and that's fine for me. I don't think a few more lines of code are any overhead, it is even kind of nice you get to write a clear doc comment for it.
The encapsulation is a large part of this PR: the intricate logic of how all these fields relate to each other and the invariants that need to be upheld are now entirely encoded in the 15 lines of new_datagram_inner. And forcing non-mut access to the fields makes it easy to audit this.
quinn-proto/src/connection/mod.rs
Outdated
| /// given size. It allows the user of such a buffer to know the current cursor position in | ||
| /// the buffer. The maximum write size is usually passed in the same unit as | ||
| /// [`BufLen::len`]: bytes since the buffer start. | ||
| pub(crate) trait BufLen { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested, can we make BufLen: BufMut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose not to do so because I don't think BufLen is that nice a trait. It is more of a plug to bridge the use of TransmitBuf and the existing concept of passing around a buffer with a bunch of offsets describing it. I hope that later it can be removed again, but also added this to limit the scope of this refactor.
And finally I think the naming would be much less clear, right now places get BufMut + BufLen and you immediately have very clear expectations. If it were to be a supertrait it'd be something like WriteBuffer, BufMutWithLen or I don't know. Naming it would become much trickier.
I do appreciate that this could become unwieldy as more traits are added. Though all the uses for now are just two traits, which I don't think is problematic yet.
15cd02d to
ad49388
Compare
|
Thank you for working on this! I think this should be considered to close #2121 |
|
Thanks for your patience--I have ambitions of trying to take a closer look at this next weekend. |
This applies the refactors from quinn-rs#2168 and quinn-rs#2195 onto our multipath branch!
gretchenfrage
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this is a good direction! Some thoughts so far...
Naming
First a minor thing: Perhaps this should be called TransmitBuilder rather than TransmitBuf? That way it matches PacketBuilder, and I feel that more accurately represents that that is tracking more complex state than just a byte sequence.
impl BufMut
With regard to this:
unsafe impl BufMut for TransmitBuf<'_> {
fn remaining_mut(&self) -> usize {
self.buf.remaining_mut()
}
unsafe fn advance_mut(&mut self, cnt: usize) {
self.buf.advance_mut(cnt);
}
fn chunk_mut(&mut self) -> &mut bytes::buf::UninitSlice {
self.buf.chunk_mut()
}
}I'm not sure that I like that. It adds an unsafe statement, as well as a handful of LOC, for the sake of being able to write buf instead of buf.buf in perhaps a dozen places. Plus, it's a slightly surprising thing to keep in mind.
trait BufLen
You're introducing this new trait for the sake of being able to pass around &mut (impl BufMut + BufLen) instead of &mut TransmitBuf. I understand that your motivation for this is to abstract things, but in this circumstance I think that directly passing around &mut TransmitBuf or &mut Vec<u8> to various functions would be a net-win for it being simple to think about.
This is related to my comment about impl BufMut, as creating a version of the "Do not pass buffer around as a Vec" commit which avoids relying on this trait would be easier if we're also not trying to make our code leverage an impl BufMut for TransmitBuf.
Getter methods
I agree with djc here--I'm pretty hesitant about having these getter methods on TransmitBuf: segment_size, num_datagrams, max_datagrams, datagram_start_offset, and datagram_max_offset.
These variables aren't really getting abstracted away if the users of TransmitBuf still have to know which of them exist and care about their specific values. That leaves the only remaining motivation of using getters like this as making sure we don't accidentally mutate them wrongly from outside the module. But it's pretty easy to check where we're writing to these variables and I don't think making these variables read-only from outside this module really gets at the bottleneck of us being able to make the code correct. And it has to be weighed against the cons of increased LOC, and us remembering which ones of these functions are simply reading a field versus which ones are doing something more transformative.
I was experimenting with an alternative way of trying to abstract some of these things, and I noticed that the num_datagrams value is used in similar ways in a few places repeatedly. So I tried to see if I could factor some of them away into more transformative methods on TransmitBuf until the num_datagrams value no longer needed to be accessed outside of TransmitBuf at all and I came up with this:
Diff
diff --git a/quinn-proto/src/connection/mod.rs b/quinn-proto/src/connection/mod.rs
index 3a39a433..62c42e73 100644
--- a/quinn-proto/src/connection/mod.rs
+++ b/quinn-proto/src/connection/mod.rs
@@ -562,7 +562,7 @@ impl Connection {
// We need to send 1 more datagram and extend the buffer for that.
// Is 1 more datagram allowed?
- if buf.num_datagrams() >= buf.max_datagrams() {
+ if !buf.more_datagrams_allowed() {
// No more datagrams allowed
break;
}
@@ -573,9 +573,7 @@ impl Connection {
// for starting another datagram. If there is any anti-amplification
// budget left, we always allow a full MTU to be sent
// (see https://github.com/quinn-rs/quinn/issues/1082)
- if self.path.anti_amplification_blocked(
- (buf.segment_size() * buf.num_datagrams()) as u64 + 1,
- ) {
+ if self.path.anti_amplification_blocked(buf.bytes_to_send()) {
trace!("blocked by anti-amplification");
break;
}
@@ -625,7 +623,7 @@ impl Connection {
builder.pad_to(MIN_INITIAL_SIZE);
}
- if buf.num_datagrams() > 1 {
+ if !buf.first_datagram() {
// If too many padding bytes would be required to continue the GSO batch
// after this packet, end the GSO batch here. Ensures that fixed-size frames
// with heterogeneous sizes (e.g. application datagrams) won't inadvertently
@@ -661,7 +659,7 @@ impl Connection {
builder.finish_and_track(now, self, sent_frames.take(), &mut buf);
- if buf.num_datagrams() == 1 {
+ if buf.first_datagram() {
buf.clip_datagram_size();
if space_id == SpaceId::Data {
// Now that we know the size of the first datagram, check
@@ -811,7 +809,7 @@ impl Connection {
// Send an off-path PATH_RESPONSE. Prioritized over on-path data to ensure that path
// validation can occur while the link is saturated.
- if space_id == SpaceId::Data && buf.num_datagrams() == 1 {
+ if space_id == SpaceId::Data && buf.first_datagram() {
if let Some((token, remote)) = self.path_responses.pop_off_path(self.path.remote) {
// `unwrap` guaranteed to succeed because `builder_storage` was populated just
// above.
@@ -900,7 +898,6 @@ impl Connection {
.mtud
.poll_transmit(now, self.packet_number_filter.peek(&self.spaces[space_id]))?;
- debug_assert_eq!(buf.num_datagrams(), 0);
buf.start_new_datagram_with_size(probe_size as usize);
debug_assert_eq!(buf.datagram_start_offset(), 0);
@@ -933,16 +930,7 @@ impl Connection {
return None;
}
- trace!(
- "sending {} bytes in {} datagrams",
- buf.len(),
- buf.num_datagrams()
- );
- self.path.total_sent = self.path.total_sent.saturating_add(buf.len() as u64);
-
- self.stats
- .udp_tx
- .on_sent(buf.num_datagrams() as u64, buf.len());
+ buf.update_stats(&mut self.path.total_sent, &mut self.stats.udp_tx);
Some(Transmit {
destination: self.path.remote,
@@ -952,9 +940,10 @@ impl Connection {
} else {
None
},
- segment_size: match buf.num_datagrams() {
- 1 => None,
- _ => Some(buf.segment_size()),
+ segment_size: if buf.first_datagram() {
+ None
+ } else {
+ Some(buf.segment_size())
},
src_ip: self.local_ip,
})
diff --git a/quinn-proto/src/connection/transmit_buf.rs b/quinn-proto/src/connection/transmit_buf.rs
index 61d3217c..71383945 100644
--- a/quinn-proto/src/connection/transmit_buf.rs
+++ b/quinn-proto/src/connection/transmit_buf.rs
@@ -1,4 +1,7 @@
use bytes::BufMut;
+use tracing::trace;
+
+use crate::{FrameStats, UdpStats};
use super::BufLen;
@@ -151,16 +154,27 @@ impl<'a> TransmitBuf<'a> {
self.segment_size
}
- /// Returns the number of datagrams written into the buffer
- ///
- /// The last datagram is not necessarily finished yet.
- pub(super) fn num_datagrams(&self) -> usize {
- self.num_datagrams
+ pub(super) fn more_datagrams_allowed(&self) -> bool {
+ self.num_datagrams < self.max_datagrams
+ }
+
+ pub(super) fn bytes_to_send(&self) -> u64 {
+ (self.segment_size * self.num_datagrams) as u64 + 1
}
- /// Returns the maximum number of datagrams allowed to be written into the buffer
- pub(super) fn max_datagrams(&self) -> usize {
- self.max_datagrams
+ pub(super) fn first_datagram(&self) -> bool {
+ self.num_datagrams == 1
+ }
+
+ pub(super) fn update_stats(&self, path_total_sent: &mut u64, udp_tx: &mut UdpStats) {
+ trace!(
+ "sending {} bytes in {} datagrams",
+ self.buf.len(),
+ self.num_datagrams,
+ );
+ *path_total_sent = path_total_sent.saturating_add(self.buf.len() as u64);
+
+ udp_tx.on_sent(self.num_datagrams as u64, self.buf.len());
}
/// Returns the start offset of the current datagram in the bufferBear in mind that there might be some flaws in that still at this point, but I think my intent should be clear. No need to incorporate this actual concept into this PR, but I think this approach to abstraction has certain advantages--it more successfully offloads conceptual complexity from the overly large poll_transmit method by extracting inherently meaningful sub-units of its logic into methods more_datagrams_allowed, bytes_to_send, first_datagram and update_stats--and at the end of this, the number of usages of num_datagrams and max_datagrams outside of transmit_buf.rs has dropped to zero and so we can just make them private / delete the getters as a reward.
Similarly, I'm inclined to leaving TransmitBuf.buf as pub(super) and thus removing is_empty, len, and as_mut_slice.
I don't have very strong feelings about the naming, this was really just a working name I thought would change but then just stuck as it did end up making sense. I think both work, but indeed
What would you think of: impl TransmitBuf<'_> {
/// Returns a buffer into which the datagram can be written
pub(super) fn datagram_mut(&mut self) -> bytes::buf::Limit<&mut Vec<u8>> {
self.buf.limit(self.buf_capacity)
}
}This is very similar to what I did in #2195 which has a (I'd like to return return a generic
I did this trait for two reasons:
So I think if you receive an explicit I think it might be possible to remove
After #2195 there is only one
This last sentence I am really not that sure about. Abstractions the compiler provides are great at ensuring correctness, having to manually check on each PR if something is being mutated that shouldn't be is very fallible. Having explicit contracts like this will make it clear if that is being changed.
Yes, I agree there may be more scope here. And I was unsure whether to go further or not. For the sake of being more incremental with changes I stopped here for this PR though. For example I think the So, from what I understand maybe next steps for this PR could be:
If you think that's helpful I'll give those changes a go. |
|
@gretchenfrage I've pushed some new commits. Mainly this finally introduces a buffer abstraction that I've long wanted but resisted creating: the This means the The main reason I was trying to avoid these changes so far was to try and keep the changes smaller. The abstractions are a fair number of lines of code again, but have a straight forward interface. |
|
FWIW, It does leave a manual |
|
Also note that |
|
I'm still looking at this, but for what it's worth:
I don't believe that's true. Maybe I misunderstood? But checking out f2ecc76 and applying this diff seems to work just fine: diff --git a/quinn-proto/src/connection/transmit_buf.rs b/quinn-proto/src/connection/transmit_buf.rs
index 31fe7e00..b429073d 100644
--- a/quinn-proto/src/connection/transmit_buf.rs
+++ b/quinn-proto/src/connection/transmit_buf.rs
@@ -141,7 +141,7 @@ impl<'a> TransmitBuf<'a> {
}
/// Returns a buffer into which the current datagram can be written
- pub(super) fn datagram_mut(&mut self) -> bytes::buf::Limit<&mut Vec<u8>> {
+ pub(super) fn datagram_mut<'s>(&'s mut self) -> impl BufMut + super::BufLen + 's {
self.buf.limit(self.buf_capacity)
} |
Can you expand on why not? From looking at it, it actually seems like it might be fine.
Notable source code bits exemplifying the expected usage of this:
The comment "This value is greater than or equal to the length of the slice returned by chunk_mut()" clarifies that it's not merely meant as the remaining length of the next chunk, and the If |
Oh, interesting. Thanks for pointing this out. Though in 4b0dcbf I now change that return type to |
|
Feedback from skimming the current state:
I think it's definitely worth getting the commit history in shape here. |
4f5f96f to
6b7df89
Compare
I really would like to keep the access to fields read-only, and rust only allows this by using getter methods. When unfamiliar with the code, having everything access everything and mutate from various places make it rather hard to understand what is going on at times. It also results in logic about fields of structs being encoded in surprising places because everywhere can access it. I understand that originally this was convenient when first writing Quinn, but also think that it is fine to be more restrictive with the abstractions now.
Yes, I was exploring this a bit more to see what can be done. This is basically an artifact of how the code was structured before. Having this code abstracted in one place makes it more obvious that this is a bit of a soar point. I think it would totally make sense to further refactor this. E.g. I can easily imagine inverting the api calls from "start new datagram" to |
I understand this, but I think we can rely on callers in the limited visibility area to respect this if you just document it. Getters are just not worth it in this case. |
|
Hi, just letting you know I'm not forgetting about this PR. Am still waiting for a round of feedback from @gretchenfrage before doing followup changes. |
gretchenfrage
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience. First off, some things I want to highlight as positives:
- Your clarification of the code comments describing the logic of some of this code is appreciated.
- Renaming to
TransmitBuilderseems good.
Some of the comments I attached here I believe are things you fix in later commits, and thus might not be relevant. But yes, as djc mentioned, it would be helpful for you to craft the PR commit history in a way that incorporates changes to commits into those commits, rather than by changing them with additional commits added to the end, because doing so meshes better with the git usage style of this project.
But yeah I like the direction where this is going.
quinn-proto/src/connection/mod.rs
Outdated
| if self | ||
| .path | ||
| .anti_amplification_blocked(segment_size as u64 * (num_datagrams as u64) + 1) | ||
| .anti_amplification_blocked((buf.segment_size * buf.num_datagrams) as u64 + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factoring out the cast is not a pure refactor. Consider this snippet, wherein the first debug line succeeds, but the experiences overflow:
#![allow(arithmetic_overflow)]
fn main () {
const N: u32 = 2_000_000;
dbg!((N as u64) * (N as u64));
dbg!((N * N) as u64);
}Do we have reason to believe that's not a problem here?
| } | ||
| } | ||
|
|
||
| unsafe impl BufMut for TransmitBuf<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain about whether it's better to have this delegate implementation of BufMut, or to simply pass in &mut buf.buf rather than &mut buf to methods that expect an impl BufMut.
quinn-proto/src/connection/mod.rs
Outdated
| now: Instant, | ||
| space_id: SpaceId, | ||
| buf: &mut Vec<u8>, | ||
| buf: &mut (impl BufMut + BufLen), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there were already a single externally defined trait that could be fully sufficient for replacing &mut Vec<u8>/&mut TransmitBuf with &mut (impl ThatTrait), I would enjoy doing so. But in this case where 1. we are taking on the complexity of defining an additional BufLen trait and its impl blocks 2. we are using compound &mut (impl X + Y) blocks to utilize it and 3. these methods are nevertheless only called with a one implementation, I'm don't really think it's worth it compared to just passing &mut Vec<u8> or &mut TransmitBuf.
| self.buf.as_mut_slice() | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to review with our development workflow if you edited your PR history to undo this addition rather than including a commit that adds these lines and a subsequent one that removes them
|
|
||
| unsafe impl BufMut for TransmitBuf<'_> { | ||
| fn remaining_mut(&self) -> usize { | ||
| self.buf.remaining_mut() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on djc's comments on not being fan of the trivial getters for fully internal things
Just a quick drive-by note -- it's not (just) that this is the style, but concretely it makes reviewing larger PRs much easier, since each commit can be reviewed in isolation. |
Well, if a project utilizes a "never rebase" workflow, that can make it easier to see what has changed about the PR since last review |
|
Which is great for the brief period where you can exactly remember what happened last review 😁 |
This TransmitBuilder is a wrapper around the buffer in which datagrams are being created. It keeps track of the state required to know the boundaries of the datagrams in the buffer.
This removes the extra arguments from Packetbuilder::new that tell the builder about datagram boundaries in favour of using the state of TransmitBuilder directly.
This moves the logic of updating the buffer for new datagrams into a function on the TransmitBuilder. This centralises all the manipulation of these variables into one logical place, ensuring all the invariants are upheld.
This helps encapsulation by ensuring that no outside users can mutate these fields. Reducing the amount of logic that needs to be reasoned about.
We want to hide the actual buffer implementation from most of these locations. Currently it is a TransmitBuilder but it likely will become something else in the future. So make this generic.
This allows making TransmitBuilder::buf private at last. Now all the logic it handles is fully encapsulated.
BufMut is no longer directly implemented on TransmitBuilder, instead you need to call TransmitBuilder::datagram_mut to get a BufMut for the current datagram.
The TransmitBuilder now only allows access to the current datagram instead of to the entire buffer. This buffer is implemented as the DatagramBuffer struct. It is defined by the BufSlice trait however: code paths outside of poll_transmit also encode QUIC packets into buffers and they use a simple Vec buffer. This trait could potentially later be useful for other buffers as well, e.g. a packet buffer. Most importantly the logic of the minimum and maximum datagra size in the PacketBuilder had to be adjusted to track offsets relative to the datagram instead of relative to the entire buffer.
This completely removes the need for TransmitBuilder::datagram_start_offset and ::datagram_max_offset, now that everything can work with the buffer returned by TransmitBuilder::datagram_mut(). It also removes a now unused field in the packet builder since it no longer needs to keep track of the start offset of the datagram: it is always 0 now.
This uses the DatagramBuffer type directly, in favour of having the BufSlice trait that is only used once. This meanse Header::encode now takes &mut DatagramBuffer and all the callers have to construct this. Previously they used `&mut Vec<u8>` for this. The DatagramBuffer type is still kept crate-internal. Endpoint::handle and friends still use `&mut Vec<u8>` for the buffer in which to build the response packet, to not change the public API.
When writing frames into packets it needs to be known how much space there is in the packets. This used to be done using a max offset into a larger buffer. This now switches this round to use BufMut::remaining_mut, which makes accessing this information easier and also removes carrying this around as an extra parameter.
The PacketBuilder no longer has any business with the TransmitBuilder, it can do everything with just the DatagramBuffer.
This makes the code a bit more readable.
Replaces manually checking for number of allowed datagrams.
6b7df89 to
6ded916
Compare
|
(Still massaging commits to take feedback into account, these newly pushed commits are not yet ready for a round of review. Will re-request review once ready again.) |
Hi, I've been looking around in
poll_receiverecently and started wondering if it could be simplified. This is my first attempt at a step in this direction, it pulls together the state about datagrams and their boundaries in the write buffer. I think this helps understand the interactions between a bunch of state changes that were previously spread across many lines. See the commit message for a detailed description.If the general direction is considered useful I think I'd like to continue this further. The next thing I'd probably look at is the buffer management for a packet. I expect a lot of places don't need to know much about the full transmit buffer they're writing into but rather need to know the (start, len, end) space they can write their frames. Probably reducing the need to do everything with offsets from the transmit buffer.
Anyway, my aim is the make things clearer and easier to manage. Let's first see if this is step in that direction and then see what the future steps turn out to be. I think any change like this should be able to stand on its own rather than rely on a future change, because the future is always uncertain and might never happen :)